Poster: kristinmak Date: Dec 24, 2015 9:49pm
Forum: texts Subject: Re: Quality control of scanned books

If I may add, and this is probably of no real help to EbbeHove, the files uploaded by user "tpb" via Google are absolutely abysmal. Images were never copied, and some pages are so poor they are unreadable. From that specific uploader, I would say the majority of the files are total garbage. Speaking of "letting the sloppy ones through", there has been a tremendous amount of spam or malware-type nonsense, such as a token "file" disguised as a useful book (made up of, say, one page of general text) with instructions to visit the uploader's personal website! I have reported several of these, and they are STILL on archive.org. I noticed this abuse started to infiltrate archive.org about one year ago; it was not like this before. *Something* has changed. "Well, more people have discovered archive and we cannot keep up with it all" is no excuse: as I said, the abusers have been pointed out and reported, and the malware or advertisements are still available. It also seems suspiciously notable that this major change occurred "all at once", so to speak, and nothing has been done about it.

EbbeHove, apologies, but I wanted to insert another complaint and make others aware of this, in case they have noticed it but not spoken up about it.


Poster: EbbeHove Date: Dec 25, 2015 8:55am
Forum: texts Subject: Re: Quality control of scanned books

Hi kristinmak

Looking at the number of uploads, I cannot help thinking that archive.org is slowly drowning in irrelevant material. Imagine visiting a physical archive where the items you pick from the shelves have no proper titles or descriptions and turn out to be anything but texts. Imagine if some of the items are black boxes that you have to put your hand in to find out what is inside (meaning downloading unknown content). Welcome to archive.org. Your complaint is understood and shared, kristinmak.

What can be done? Well, this organization with only 140 employees surely needs help, and I do not mean money or volunteers to "welcome and seat guests", I mean editors. Voluntary editors who can categorize material and quarantine or delete the stuff that is illegal or just spam.

Ebbe


Poster: Jeff Kaplan Date: Jan 2, 2016 12:47pm
Forum: texts Subject: Re: Quality control of scanned books

hi,

there is a beta flagging system in place on each details page. if you feel an item should be flagged for review, feel free to do just that. it may take a bit of time to sort through all flagged items since the system is not yet fully operational, but marking the items for review is a first step.


Poster: EbbeHove Date: Jan 3, 2016 11:27pm
Forum: texts Subject: Re: Quality control of scanned books

Hello Mr. Kaplan
Well, flagging content really just seems like barking at the moon.
Why not let your regular users do something more productive?
For instance, since December 25th, the Internet Archive has received 43 "Text" items with Language=Danish. Problem is: Not one of them is in Danish. Most of them are (as far as I can see) music files or bits of software, and some are videos. I do not want to flag them and wait and wait. I want to be able to remove the faulty language field and the faulty text field so future searches by anyone will not be polluted by this material. But the Internet Archive will not allow this. Do you have any plans to review your attitude on external editors?
Ebbe Hove


Poster: xensyria Date: Jan 7, 2016 6:00am
Forum: texts Subject: Re: Quality control of scanned books

Firstly, IA is great, and while I sometimes share the frustration mentioned above, the number of times it's helped without problem massively outweighs this.

I agree that IA has the potential for at least as strong a community as, say, Wikimedia Commons, if the platform allowed for it.

Are there any consequences that we haven't thought of that might make them reluctant to do so? (E.g. might IA become more copyright-conscious along the lines of Wikimedia, rather than keeping its current, more live-and-let-live approach, which, incidentally, seems more in line with current DMCA legislation than Wiki?)


Poster: EbbeHove Date: Jan 9, 2016 12:37am
Forum: texts Subject: Re: Quality control of scanned books

Hello xensyria
Glad we agree on the need for a community.
For IA to embrace the idea, they could start with a small group of volunteer editors with limited editing rights, just to see that it works. Personally, I would be happy to be able to edit the "language" parameter, to add and remove "topics", and to reclassify material (to move items erroneously marked as "text"). I see no need for the right to delete material; this is just to clean up categories so searches do not get cluttered by loads of irrelevant files.
Let us hope IA has the courage to trust their users.
Ebbe


Poster: xensyria Date: Jan 9, 2016 3:46pm
Forum: texts Subject: Re: Quality control of scanned books

I would love to see a strong community here, and there are intermediary steps as you suggest, as well as ways to manage the risks of incompetent or rogue editors: edits could either be stored as a list waiting to go live (pending changes), or go live immediately and be checked afterwards (patrolling). It would basically be the current flagging system, but without the person checking the flag having to do the work!

But I've been thinking a bit more about the IA's perspective. I guess community curation isn't part of their vision of an information archive; the resource itself looks to be the focus.

The model of uploaders (some very prolific with, I think, more powers surrounding things like collections etc.) adding the info as they go has worked to get it to where it is.

There do seem to be tentative moves towards this to address the problems you mention (like the flagging system), but it does seem to be just that: trying to solve a problem rather than embracing an opportunity.

Perhaps there's also a bottleneck there in terms of resources: there might not be the funds to divert to redesign the platform to enable a community to flourish.

I'm not sure we can influence them to go for this either, other than by showing that we're here if they ever decide to go down this route. Would be great to hear any official thoughts on it though!

This post was modified by xensyria on 2016-01-09 23:46:00


Poster: Jeff Kaplan Date: Dec 25, 2015 8:35am
Forum: texts Subject: Re: Quality control of scanned books

fwiw, all the books uploaded by tpb were scanned by Google.
as for spam it is a challenge and we are dealing with it as we can. we remove many thousands of unwanted pages. if you post or email us urls for items you think are spam we do respond.


Poster: xensyria Date: Jan 9, 2016 3:51pm
Forum: texts Subject: Re: Quality control of scanned books

Just trying to think of a way to remove the burden from you guys, so you can concentrate on the bigger picture stuff of making the site even greater!

Reducing the amount of curation employees have to do would surely also make the donations stretch further.

This post was modified by xensyria on 2016-01-09 23:51:18